3. Stochastic Processes 

We learn in kindergarten about the phenomenon of Brownian motion, the random 
jittery movement that a particle suffers when it is placed in a liquid. Famously, it is 
caused by the constant bombardment due to molecules in the surrounding the liquid. 
Our goal in this section is to introduce the mathematical formalism that allows us to 
model such random behaviour. 

3.1 The Langevin Equation 

In contrast to the previous section, we will here focus on just a single particle. However, 
this particle will be sitting in a background medium. If we know the force F acting on 
the particle, its motion is entirely deterministic, governed by 

mi = —71 + F (3.1) 

In contrast to the previous section, this is not a Hamiltonian system. This is because 
we have included a friction term with a coefficient 7. This arises due to the viscosity, 77, 
of the surrounding liquid that we met in the previous section. If we model the particle 
as a sphere of radius a then there is a formula due to Stokes which says 7 = 6^770. 
However, in what follows we shall simply treat 7 as a fixed parameter. In the presence 
of a time independent force, the steady-state solution with x = is 

x — — F 

7 

For this reason, the quantity I/7 is sometimes referred to as the mobility. 

Returning to (3.1), for any specified force F, the path of the particle is fully deter- 
mined. This is seemingly at odds with the random behaviour observed in Brownian 
motion. The way in which we reconcile these two points is, hopefully, obvious: in 
Brownian motion the force F that the particle feels is itself random. In fact, we will 
split the force into two pieces, 

F = - W + f(t) 

Here V is a fixed background potential in which the particle is moving. Perhaps V 
arises because the particle is moving in gravity; perhaps because it is attached to a 

— # 

spring. But, either way, there is nothing random about V. In contrast, f(t) is the 
random force that the particle experiences due to all the other atoms in the liquid. It is 
sometimes referred to as noise. The resulting equation is called the Langevin equation 

m $ = _ 7 f _ W + fit) (3.2) 
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Although it looks just like an ordinary differential equation, it is, in fact, a different 
beast known as a stochastic differential equation. The reason that it's different is that 
we don't actually know what f(t) is. Yet, somehow, we must solve this equation 
anyway! 

Let's clarify what is meant by this. Suppose that you did know the microscopic 
force f(t) that is experienced by a given particle. Then you could, in principle, go 
ahead and solve the Langevin equation (3.2). But the next particle that you look at 
will experience a different force f(t) so you'll have to solve (3.2) again. And for the 
third particle, you'll have to solve it yet again. Clearly, this is going to become tedious. 

— # 

What's more, it's unrealistic to think that we will actually know fit) in any specific 
case. Instead, we admit that we only know certain crude features of the force fit) 
such as, for example, its average value. Then we might hope that this is sufficient 
information to figure out, say, the average value of x(t). That is the goal when solving 
the Langevin equation. 

3.1.1 Diffusion in a Very Viscous Fluid 

We start by solving the Langevin equation in the case of vanishing potential, V = 
0. (For an arbitrary potential, the Langevin equation is an unpleasant non-linear 
stochastic differential equation and is beyond our ambition in this course. However, we 
will discuss some properties of the case with potential is the following section when we 
introduce the Fokker-Planck equation). We can simplify the problem even further by 
considering Brownian motion in a very viscous liquid. In this case, motion is entirely 
dominated by the friction term in the Langevin equation and we ignore the inertial 
term, which is tantamount to setting m = 0. 

When m = 0, we're left with a first order equation, 



At this point, we can't go any further until we specify some of the properties of the 
noise fit). Our first assumption is that, on average, the noise vanishes at any given 
time. We will denote averages by ( • ) , so this assumption reads 



m = - /to 



For any fit), this can be trivially integrated to give 




(3.3) 



</(*)> = o 



(3.4) 
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Taking the average of (3.3) then gives us the result: 



(*(*)>= *(0) 



The is deeply unsurprising: if the average noise vanishes, the average position of the 
particle is simply where we left it to begin with. Nonetheless, it's worth stressing that 
this doesn't mean that all particles sit where you leave them. It means that, if you drop 
many identical particles at the origin, x(0) = 0, they will all move but their average 
position — or their centre of mass — will remain at the origin. 

We can get more information by looking at the variance of the position, 



This will tell us the average spread of the particles. We can derive an expression for 
the variance by first squaring (3.3) and then taking the average, 



In order to compute this, we need to specify more information about the noise, namely 
its correlation function ( fiitijfjfa) ) where we have resorted to index notation, i,j = 
1,2,3 to denote the direction of the force. This is specifying how likely it is that the 
particle will receive a given kick fj at time t 2 given that it received a kick at time t±. 

In many cases of interest, including that of Brownian motion, the kicks imparted by 
the noise are both fast and uncorrelated. Let me explain what this means. Suppose 
that a given collision between our particle and an atom takes time r co ii. Then if we 
focus on time scales less than r co ii then there will clearly be a correlation between the 
forces imparted on our particle because these forces are due to the same process that's 
already taking place. (If an atom is coming in from the left, then it's still coming in 
from the left at a time t <C r co u later). However if we look on time scales t ^> r co n, the 
force will be due to a different collision with a different atom. The statement that the 
noise is uncorrelated means that the force imparted by later collisions knows nothing 
about earlier collisions. Mathematically, this means 



The statement that the collisions are fast means that we only care about time scales 
t 2 — ti ^> t co11 and so can effectively take the limit r coll — > 0. However, that doesn't 



((f(t)-f(O)) 2 ) 




(3.5) 



( fi(ti)fj(t 2 ) ) = when t 2 - h > r coll 
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quite mean that we can just ignore this correlation function. Instead, when we take 
the limit r co ii — > 0, we're left with a delta-function contribution, 

( fi(ti)fj(h) > = 2/V 6ij 5(t 2 - h) (3.6) 

Here the factor of 7 2 has been put in for convenience. We will shortly see the inter- 
pretation of the coefficient D, which governs the strength of the correlations. Noise 
which obeys (3.4) and (3.6) is often referred to as white noise. It is valid whenever the 
environment relaxes back down to equilibrium much faster than the system of interest. 
This guarantees that, although the system is still reeling from the previous kick, the 
environment remembers nothing of what went before and kicks again, as fresh and 
random as the first time. 

Using this expression for white noise, the variance (3.5) in the position of the particles 

is 

((x(t)-x(0)) 2 } = 6Dt (3.7) 

This is an important result: the root-mean square of the distance increases as \/i with 
time. This is characteristic behaviour of diffusion. The coefficient D is called the 
diffusion constant. (We put the factor of 7 2 in the correlation function (3.6) so that 
this equation would come out nicely). 

3.1.2 Diffusion in a Less Viscous Liquid 

Let's now return to the Langevin equation (3.2) and repeat our analysis, this time 
retaining the inertia term, so m ^ 0. We will still set V = 0. 

As before, computing average quantities — this time both velocity ( x(t) ) and posi- 

— * 

tion (x(t) ) is straightforward and relatively uninteresting. For a given /(£), it is not 
difficult to solve (3.2). After multiplying by an integrating factor e 7 */" 1 , the equation 
becomes 




which can be happily integrated to give 

kt) = 'x [Q)e-^ m + — f dt' fit') e^'-'V™ (3.8) 
m J 

We now use the fact that the average of noise vanishes (3.4) to find that the average 
velocity is simply that of a damped particle in the absence of any noise, 

(£(*)) = f(0)e- 7 */ m 
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Similarly, to determine the average position we have 

x{t) = f(0) + f dt' 'x(t') (3.9) 
Jo 

From which we get 

(£(*)) =f(0)+ f dt' (£(f)> 
Jo 

= f(0) + — f(0) (l-e-T*/ m ) 

Again, this is unsurprising: when the average noise vanishes, the average position of 
the particle coincides with that of a particle that didn't experience any noise. 

Things get more interesting when we look at the expectation values of quadratic 
quantities. This includes the variance in position ( x(t) ■ x(t) ) and velocity ( x(t) ■ x(t) ), 
but also more general correlation functions in which the two quantities are evaluated at 
different times. For example, the correlation function ( Xi{ti)xj(t 2 ) ) tells us information 
about the velocity of the particle at time t 2 given that we know where its velocity at 
time t\. From (3.8), we have the expression, 

(iiitjijih)) = (x t (t 1 ))(x J (t 2 ) ) + — 2 f 1 dt[ T dt' 2 (fMfM)) e*W-*-*>' m 

m Jo Jo 

where we made use of the fact that {fit)) = to drop the terms linear in the noise 
/. If we use the white noise correlation function (3.6), and assume t 2 > > 0, the 
integral in the second term becomes, 

{iiitjxjih)) = {± i {t 1 )){x j {t 2 )) + '^8 ij e-^ +t ^ m r dt' e^'l m 

m Jo 

= { Xi ( tl )){x j (t 2 )) + ^5 ij (e^-*!)/"* _ e -7(ti+ta)/m) 

For very large times, t±,t 2 — > oo, we can drop the last term as well as the average 
velocities since ( x(t) ) — > 0. We learn that the correlation between velocities decays 
exponentially as 

(±i(fi)±i(* 2 )> -> — 5 lje -^~^ m 
m 

This means that if you know the velocity of the particle at some time t\, then you can 
be fairly confident that it will have a similar velocity at a time t 2 < £1 + 771/7 later. 
But if you wait longer than time 771/7 then you would be a fool to make any bets on 
the velocity based only on your knowledge at time t\. 



- 56 - 



Finally, we can also use this result to compute the average velocity-squared (which, 
of course, is the kinetic energy of the system). At late times, the any initial velocity 
has died away and the resulting kinetic energy is due entirely to the bombardment by 
the environment. It is independent of time and given by 

(f(t) • f (t) ) = — (3.10) 
m 

One can compute similar correlation functions for position (xi(ti)xj(t2) ). The ex- 
pressions are a little more tricky but still quite manageable. (Combining equations 
(3.9) and (3.8), you can see that you will a quadruple integral to perform and figuring 
out the limits is a little fiddly). At late times, it turns out that the variance of the 
position is given by the same expression that we saw for the viscous liquid (3.7), 

((x(t) -x(0)) 2 ) =6Dt (3.11) 

again exhibiting the now-familiar y/i behaviour for the root-mean-square distance. 

3.1.3 The Einstein Relation 

We brushed over something important and lovely in the previous discussion. We com- 
puted the average kinetic energy of a particle in (3.10). It is 

1 3 

E = —mix • x) — -D'j 

But we already know what the average energy of a particle is when it's bombarded by 
its environment: it is given by the equipartition theorem and, crucially, depends only 
on the temperature of the surroundings 

E = \k B T 

It must be therefore that the diffusion constant D is related to the mobility I/7 by 

D = ^ (3.12) 

7 

That's rather surprising! The diffusion constant captures the amount a particle is 
kicked around due to the background medium; the mobility expresses the how hard it 
is for a particle to plough through the background medium. And yet they are related. 
This equation is telling us that diffusion and viscosity both have their microscopic origin 
in the random bombardment of molecules. Notice that D is inversely proportional to 
7. Yet you might have thought that the amount the particle is kicked increases as the 
viscosity increases. Indeed, looking back at (3.6), you can see that the amount the 
particle is kicked is actually proportional to -D7 2 ~ T7. Which is more in line with our 
intuition. 
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Equation (3.12) is known as the Einstein relation. It is an important example of 
the fluctuation-dissipation theorem. The fluctuations of the particle as it undergoes 
its random walk are related to the drag force (or dissipation of momentum) that the 
particle feels as it moves through the fluid. 

The Einstein relation gives an excellent way to determine Boltzmann's constant ex- 
perimentally. Watch a particle perform a Brownian jitter. After time t, the distance 
travelled by the particle (3.7) should be 

(x 2 ) = ^t 
nr/a 

where we have used the Stokes formula 7 = Q-wqa to relate the mobility to the viscosity 
\x and radius a of the particle. This experiment was done in 1909 by the French physicist 
Jean Baptiste Perrin and won him the 1926 Nobel prize. 

3.1.4 Noise Probability Distributions 

So far, we've only needed to use the two pieces of information about the noise, namely, 

(/(*)>= (3.13) 
( fi(ti)fj(t 2 ) ) = 2 J D 7 2 M(t 1 - h) (3.14) 

However, if we wanted to compute correlation functions involving more than two ve- 
locities or positions, it should be clear from the calculation that we would need to 
know higher moments of the probability distribution for f(t). In fact, the definition of 
white noise is that there are no non-trivial correlations other than ( fi(ti)fj(t 2 ) ). This 
doesn't mean that the higher correlation functions are vanishing, just that they can be 
reduced to the two-time correlators. This means that for N even, 

( fh(ti) ■ ■ ■ fi N (t N ) ) = (f h (t)f i2 (t 2 ) ) ... (/i^.^tiv-O/ijv^Jv) ) + permutations 
while, for N odd 

(f h (h)...f iN (t N )) = 

Another way of saying this is that all but the second cumulant of the probability 
distribution vanish. 

Instead of specifying all these moments of the distribution, it is often much more 
useful to specify the probability distribution for f(t) directly. However, this is a slightly 
subtle object because we want to specify the probability for an entire function fit), 
rather than a single random variable. This means that the probability distribution 
must be a functional: you give it a function fit) and it spits back a number which, in 
this case, should be between zero and one. 
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The good news is that, among the class of probability distributions over functions, 
the white noise distribution is by far the easiest! If we were dealing with a single 
random variable, the distribution that has only two-point correlators but no higher is 
the Gaussian. And, suitably generalised, this also works for our functional probability 
distribution. The probability distribution that gives white noise is 



Prob[/(*)] = A/"exp ^- J 



where Af is a normalization factor which is needed to ensure that the sum over all 

— # 

probabilities gives unity. This "sum" is really a sum over all functions f(t) or, in other 
words, a functional integral. The normalization condition which fixes M is then 

j Df(t) Prob[/(i)] = 1 (3.15) 

With this probability distribution, all averaging over the noise can now be computed 
as a functional integral. If you have any function g(x), then its average is 



{g{x))=M J Df(t) g(x f )e-f dt ^ 4 ^ 2 



where the notation Xf means the solution to the Langevin equation in the presence of 
a fixed source /. 

Let's now show that the Gaussian probability distribution indeed reproduces the 
white noise correlations as claimed. To do this, we first introduce an object Z[J(t)} 
known as a generating function. We can introduce a generating function for any prob- 
ability distribution, so let's keep things general for now and later specialise to the 
Gaussian distribution. 

// r+oo 
Df(t) Prob[/(t)] exp (j dt J(t) ■ f(t) 

This generating function is a functional: it is a function of any function J(t) that we 
care to feed it. By construction, Z[0] = 1, courtesy of (3.15). 

As the notation Z suggests, the generating function has much in common with the 
partition function that we work with in a first course of statistical mechanics. This is 
most apparent in the context of statistical field theories where the generating function 
is reminiscent of the partition function. Both are functional, or path, integrals. These 
objects are also important in quantum field theory where the names partition function 
and generating function are often used synonymously. 
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The function J that we have introduced is, in this context, really little more than 
a trick that allows us to encode all the correlation functions in Z[J]. To see how this 
works. Suppose that we differentiate Z with respect to J evaluated at some time t — ti 

— * 

and then set J = 0. We have 
5Z 



SJi(h) 



Df(t)f i (t 1 )PTOb\f(t)] = (fi(tl 



J=0 

Playing the same game, first taking n derivatives, gives 
5 n Z 



SJi^tJSJizfe) ■ ■ ■ Ji n (tn) 



= J Dfit) fMfM . . . f in (t n ) prob[/(i)] 

= (fMfM.-.fiM) 



So we see that if we can compute Z[J], then successive correlation functions are simply 

— * 

the coefficients of a Taylor expansion in J. This is particularly useful for the Gaussian 
distribution where the generating function is, 

Z[J(t)} =AfJ Dfit) exp {- J + ^dt f^f® - J(t) ■ fit) 

But this is nothing more than a Gaussian integral. (Ok, it's an infinite number of 
Gaussian integrals because it's a functional integral. But we shouldn't let that phase 
us). We can easily compute it by completing the square 

Z[J{t)\ = N j Dfit) exp (-^j £°°dt [f(t) - 2D 7 2 Jit)] 2 - 4L>V J(i) • f(t)j 

— » — * _ — * 

After the shift of variable, / — > / — 2D7 J, the integral reduces to (3.15), leaving 
behind 



Z[J(t)\ = exp ^ 7 2 J dt Jit) ■ J(t)j 



Now it is an easy matter to compute correlation functions. Taking one derivative, we 
have 

5Z 



2DY Ji{h) Z[J] 



SMh) 

But this vanishes when we set J = 0, in agreement with our requirement (3.13) that 
the average noise vanishes. Taking a second derivative gives, 

SMhW.ih) = 2D ^ 5( - h +U?1*Mt 1 )J j (t 2 )Z[jl 

Now setting J = 0, only the first term survives and reproduces the white noise corre- 
lation (3.14). One can continue the process to see that all higher correlation functions 
are entirely determined by (/j fj ). 
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3.1.5 Stochastic Processes for Fields 

Finally, it's worth mentioning that Langevin-type equations are not restricted to par- 
ticle positions. It is also of interest to write down stochastic processes for fields. For 
example, we may want to consider a time dependent process for some order parameter 
m(f,t), influenced by noise 

dm 2 2 
— — = cv m — am — 2bm + t 
dt J 

where f(r,t) is a random field with correlations (/) = and 

( f{r u h)f(r 2 , t 2 ) ) ~ 5 d {n - f 2 )<^i - t 2 ) 

A famous example of a stochastic process is provided by the fluctuating boundary 
between, say, a gas and a liquid. Denoting the height of the boundary as h(f,t), the 
simplest description of the boundary fluctuations is given by the Edwards- Wilkinson 
equation, 

A somewhat more accurate model is given by the Kardar-Parisi-Zhang equation, 

^ = V 2 h + \(Vhf + f 

We won't have anything to say about the properties of these equations in this course. 
An introduction can be found in the second book by Kardar. 

3.2 The Fokker-Planck Equation 

Drop a particle at some position, say Xq at time to. If the subsequent evolution is noisy, 
so that it is governed by a stochastic Langevin equation, then we've got no way to know 
for sure where the particle will be. The best that we can do is talk about probabilities. 
We will denote the probability that the particle sits at x at time t as P(x,t; x ,t ). 

In the previous section we expressed our uncertainty in the position of the particle 
in terms of correlation functions. Here we shift perspective a little. We would like to 
ask: what probability distribution P(x, t; xq, to) would give rise to the same correlation 
functions that arose from the Langevin equation? 

We should stress that we care nothing about the particular path x(t) that the particle 
took. The probability distribution over paths would be a rather complicated functional 
(rather like those we saw in Section 3.1.4). Instead we will ask the much simpler 
question of the probability that the particle sits at x at time t, regardless of how it got 
there. 
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It is simple to write down a formal expression for the probability distribution. Let's 
denote the solution to the Langevin equation for a given noise function / as Xf. Of 
course, if we know the noise, then there is no uncertainty in the probability distribution 
for x. It is simply P(x,t) = 5(x — xf). Now averaging over all possible noise, we can 
write the probability distribution as 

P(x,t) = {6(x-x f )) (3.16) 

In this section, we shall show that the P(x, t) obeys a simple partial differential equation 
known as the Fokker-Planck equation. 

3.2.1 The Diffusion Equation 

The simplest stochastic process we studied was a particle subject to random forces in 
a very viscous fluid. The Langevin equation is 

m = l - m 

In Section 3.1.1 we showed that the average position of the particle remains unchanged: 
if x(t = 0) = then (x(t)) = for all t. But the variance of the particle undergoes a 
random walk (3.7), 

(x(t) 2 ) = QDt (3.17) 

For this simple case, we won't derive the probability distribution: we'll just write it 
down. The probability distribution that reproduces this variance: it is just a Gaussian 

/ 1 \ 3/2 

where the factor out front is determined by the normalization requirement that 

/AP(x, t ) = l 

for all time t. Note that there is more information contained in this probability dis- 
tribution that the just the variance (3.17). Specifically, all higher cumulants vanish. 
(This means, for example, that (x 3 ) = and ( x 4 ) = 3( x 2 ) and so on). But it simple 
to check that this is indeed what arises from the Langevin equation with white noise 
described in Section 3.1.4. 
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The probability distribution (3.18) obeys the diffusion equation, 

This is the simplest example of a Fokker-Planck equation. However, for more com- 
plicated versions of the Langevin equation, we will have to work harder to derive the 
analogous equation governing the probability distribution P. 

3.2.2 Meet the Fokker-Planck Equation 

Let's now consider the a more general stochastic process. We'll still work in the viscous 
limit for now, setting m = so that we have a first order Langevin equation, 

7 £=-W + / (3.19) 

A quadratic V corresponds to a harmonic oscillator potential and the Langevin equation 
is not difficult to solve. (In fact, mathematically it is the same problem that we solved 
in Section 3.1.2. You just have to replace x — v — > x). Any other V gives rise to a non- 
linear stochastic equation (confusingly sometimes called "quasi-linear" in this context) 
and no general solution is available. Nonetheless, we will still be able to massage this 
into the form of a Fokker-Planck equation. 

We begin by extracting some information from the Langevin equation. Consider a 
particle sitting at some point x at time t. If we look again a short time 8t later, the 
particle will have moved a small amount 

i i ft+St 

5x='x5t = — VV5t + - dt'f(t') (3.20) 
7 7 Jt 

Here we've taken the average value of the noise function, /, over the small time interval. 
However, we've assumed that the displacement of the particle Sx is small enough so 
that we can evaluate the force W at the original position x. (It turns out that this 
is ok in the present context but there are often pitfalls in making such assumptions in 
the theory of stochastic processes. We'll comment on one such pitfall at the end of this 
Section). We can now compute the average. Because (fit)) = 0, we have 

(Sx) = --VVSt (3.21) 

7 



The computation ( Sxi Sxj) is also straightforward, 

/t+st 
dt'idiVfj^ + djVfiit')) 



/t+St rt+St 
<tej dt"(f t (t')j)(t")) 
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Both the first two terms are order St 2 . However, using (3.6), one of the integrals in 
the third term is killed by the delta function, leaving just one integral standing. This 
ensures that the third term is actually proportional to St, 

( SxiSxj ) = 28ijD St + 0(5t 2 ) (3.22) 

We will ignore the terms of order St 2 . Moreover, It is simple to see that all higher 
correlation functions are higher order in St. For example, (x 4 ) ~ St 2 . These too will 
be ignored. 

Our strategy now is to construct a probability distribution that reproduces (3.21) 
and (3.22). We start by considering the conditional probability P(x,t + St;x',t) that 
the particle sits at x at time t + St given that, a moment earlier, it was sitting at x' . 
From the definition (3.16) we can write this as 

P(x, t + St; x', t) = ( S(x - x ' - Sx) ) 

where Sx is the random variable here; it is the distance moved in time St. Next, 
we do something that may look fishy: we Taylor expand the delta-function. If you're 
nervous about expanding a distribution in this way, you could always regulate the delta 
function in your favourite manner to turn it into a well behaved function. However, 
more pertinently, we will see that the resulting expression sits inside an integral where 
any offending terms make perfect sense. For now, we just proceed naively 

P(x,t + St- 2',t) = (l + (6x i )^ + ±( Sx % Sx, ) J^. + . . . W - (3-23) 

We have truncated at second order because we want to compare this to (3.27) and, as 
we saw above, ( Sx) and ( Sx 2 ) are the only terms that are order St. 

We now have all the information that we need. We just have to compare (3.27) and 
(3.23) and figure out how to deal with those delta functions. To do this, we need one 
more trick. Firstly, recall that our real interest is in the evolution of the probability 
P(x, t; xo, to), given some initial, arbitrary starting position x{t = t ) = xo- There is an 
obvious property that this probability must satisfy: if you look at some intermediate 
time t < t' < t, then the particle has to be somewhere. Written as an equation, this 
"has to be somewhere" property is called the Chapman-Kolmogorov equation 

/+oo 
d 3 x P(x , t; x', t') P(x', t'; x , t ) (3.24) 
-oo 

Replacing t by t + St, we can substitute our expression (3.23) into the Chapman- 
Kolmogorov equation, and then integrate by parts so that the derivatives on the delta 
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function turn and hit P(x', t'; xq, to). The delta-function, now unattended by deriva- 
tives, kills the integral, leaving 

P(x, t + St; x , t ) = P(x, t; x , t ) - 7^- 5xi )P(x, t; x , to)) 

1 d 2 
+-(dxi dxj ) — — P(x, t; x , to) + . . . (3.25) 

1 3 

Using our expressions for (5x) and (5x5x) given in (3.21) and (3.22), this becomes 

1 d fdV \ 

P(x, t + St; x , t ) = P(x, t; x , t ) + -7^7 ( P(x, t; x , t ) J St 

+D—P(x,t;x ,t )St + ... (3.26) 

But we can also get a much simpler expression for the left-hand side simply by Taylor 
expanding with respect to time, 

P(x , t + St; x , t ) = P(x, t; x , t ) + ^P(x , t; x , t ) St + . . . (3.27) 
Equating (3.27) with (3.26) gives us our final result, 

^ = Iv • (PW) + DV 2 P (3.28) 
at 7 

This is the Fokker-Planck equation. This form also goes by the name of the Smolu- 
chowski equation or, for probabilists, Kolomogorov's forward equation. 

Properties of the Fokker-Planck Equation 

It is useful to write the Fokker-Planck equation as a continuity equation 

W = V-J (3.29) 

where the probability current is 

j = -PW + DVP (3.30) 

7 

The second term is clearly due to diffusion (because there's a big capital D in front of 
it); the first term is due to the potential and is often referred to as the drift, meaning 
the overall motion of the particle due to background forces that we understand. 
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One advantage of writing the Fokker-Planck equation in terms of a current is that 
we see immediately that probability is conserved, meaning that if J d 3 x P = 1 at some 
point in time then it will remain so for all later times. This follows by a standard 
argument, 



where the last equality follows because we have a total derivative (and we are implicitly 
assuming that there's no chance that the particle escapes to infinity so we can drop the 
boundary term). 

The Fokker-Planck equation tells us how systems evolve. For some systems, such as 
those described by the diffusion equation, there is no end point to this evolution: the 
system just spreads out more and more. However, for generic potentials V there are 
time-independent solutions to the Fokker-Planck equation obeying V • J = 0. These 
are the equilibrium configurations. The solution is given by 



Using the Einstein relation (3.12), this becomes something very familiar. It is simply 
the Boltzmann distribution for a particle with energy V(x) in thermal equilibrium 



Isn't that nice! (Note that there's no kinetic energy in the exponent as we set m = 
as our starting point). 

An Application: Escape over a Barrier 

As an application of the Fokker-Planck equation, consider thermal escape from the 
one-dimensional potential shown in Figure 7. We'll assume that all the particles start 
off sitting close to the local minimum at x min . We model the potential close to this 
point as 



and we start our particles in a distribution that is effectively in local equilbrium (3.31), 
with 




P(x) 



V(x)hD 



P{x) ~ e 



V(x)/k B T 



(3.31) 





(3.32) 
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V(x)A 




Figure 7: Escape over a Barrier. 

But, globally, x min is not the lowest energy configuration and this probability distribu- 
tion is not the equilibrium configuration. In fact, as drawn, the potential has no global 
minimum and there is no equilibrium distribution. So this isn't what we'll set out to 
find. Instead, we would like to calculate the rate at which particles leak out of the trap 
and over the barrier. 

Although we're clearly interested in a time dependent process, the way we proceed is 
to assume that the leakage is small and so can be effectively treated as a steady state 
process. This means that we think of the original probability distribution of particles 
(3.32) as a bath which, at least on the time scales of interest, is unchanging. The steady 
state leakage is modelled by a constant probability current J = J , with J given by 
(3.30). Using the Einstein relation D = k B T/^, we can rewrite this as 

J = ML e -V(x)/k B T f e +V(x)/k B T p\ 

7 dx ^ ' 

The first step is to integrate Jo e +v<yX ^ kBT between the minimum x m i n and some distance 
far from all the action, x ^> x max , which we may as we call x = x*, 

^ J e v ^ h * T = ^ [e v ^ k * T P(x min ) - e v ^/ kBT P{ Xi )] 

" ^min ^ 

But we can take the probability P(x+) to be vanishingly small compared to P(x min ) 
given in (3.32), leaving us with 



Joe v^/k B T^kBT 

7 V 2ixk B T 
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Meanwhile, the integral on the left-hand-side is dominated by the maximum of the 
potential. Let's suppose that close to the maximum, the potential looks like 

V(x) ~ Vmax — -^m ax (x — X max ) 2 

Then we'll write the integral as 



Combining the two expressions (3.33) and (3.34), we get our final result for the rate of 
escape over the barrier 

j ^ ^min^max -y max /fe s T 
Z7T7 

3.2.3 Velocity Diffusion 

So far we've ignored the inertia term, setting m — 0. Let's now put it back in. We can 
start by setting the potential to zero, so that the Langevin equation is 

mx = —7a 7 + fit) 

But, we can trivially rewrite this as a first order equation involving v — x, 

mv = — r yv + f(t) 

This means that if we're only interested in the distribution over velocities, P(v, t), then 
we have exactly the same problem that we've just looked at, simply replacing x — > v 
and 7 — > m. (Actually, you need to be a little more careful. The diffusion constant 
D that appears in (3.28) was really D'-f 2 /'-/ 2 where the numerator arose from the noise 
correlator and the denominator from the 7a 7 term in the Langevin equation. Only the 
latter changes, meaning that this combination gets replaced by -D7 2 /m 2 ). The resulting 
Fokker-Planck equation is 

dP 19 D 1 2 dP\ 

-kt = -Tp • lPv + (3.35) 
ot mov \ m av J 

The equilibrium distribution that follows from this obeys dP/dt = 0, meaning 



dP m „ / m 



3/2 



dv Di \2%k B T 

where we've again used the Einstein relation = k B T. This, of course, is the Maxwell- 
Boltzmann distribution. 
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In fact, we can do better than this. Suppose that we start all the particles off at 
t — with some fixed velocity, v = vq. This mean that the probability distribution is 
a delta-function, P(v,t = 0) = S 3 (v — v ). We can write down a full time-dependent 
solution to the Fokker-Planck equation (3.35) with this initial condition. 

m \ 3 ^ 2 / m(v — ■u e~ 7 ^ m ) 2 



P&t) = o-,. m,, —^U^) ex P 



2nk B T(l - e-W/™); F V 2k B T{l - e-W/™) 

As t — > oo, we return to the Maxwell-Boltzmann distribution. But now this tells us 
how we approach equilibrium. 

The Kramers-Chandrasekhar Fokker-Planck Equation 

As our final example of a Fokker-Planck equation, we can consider the Langevin equa- 
tion with both acceleration term and potential term, 

m£ = - 7 f - W + /(*) 

Now we are looking for a probability distribution over phase space, P(x,x,i). The 
right way to proceed is to write this as two first-order equations. The first of these is 
simply the definition of velocity v — x. The second is the Langevin equation 

mi? = _ 7 tf_ vv + /(i) 

These can now be combined into a single Langevin equation for six variables. Once 
armed with this, we need only follow the method that we saw above to arrive at a 
Fokker-Planck equation for P(x,v,t), 

d d \ 1 d { dV\ D-f 2 d 2 P 

— + v 1 —— 1 P = — — — \iv l P + P— — I + (3 36) 

dt dx { ) mdv i \ dx i J m 2 dv i dv i v ' ; 

This form of the Fokker-Planck equations is sometimes called the Kramers equation 
and sometimes called the Chandrasekhar equation. 

Note that this equation is now capturing the same physics that we saw in the Boltz- 
mann equation: the probability distribution P(x, v, t) is the same object that we called 
fi(f,p; t) in Section 2. Moreover, it is possible to derive this form of the Fokker-Planck 
equation, starting from the Boltzmann equation describing a heavy particle in a sur- 
rounding bath of light particles. The key approximation is that in small time intervals 
St, the momentum of the heavy particle only changes by a small amount. Looking back, 
you can see that this was indeed an assumption in the derivation of the Fokker-Planck 
equation in Section 3.2.2, but not in the derivation of the Boltzmann equation. 
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Integrating over Velocity 

The equation (3.36) governing the probability distribution over phase space P(x,v,t) 
looks very different from the Fokker-Planck equation governing the probability distri- 
bution over configuration space (3.28). Yet the related Langevin equations are simply 
related by setting m = or, equivalently, looking at systems with large 7. How can we 
derive (3.28) from (3.36)? 

The computation involves a careful expansion of (3.36) in powers of I/7. Let's see 
how this works. Firstly, we use the Einstein relation to write D'-f = ksT, and the 
rearrange the terms in (3.36) to become 

8 fk B T8 , ^„ ua +vi 8_ L 8va\ p 



dv l \ m 2 dv l m J 7 \ dt dx 1 m dx % dv 
We're going to solve this perturbatively in I/7, writing 



p = p(0) + I p(l) + J_ p(2) + 



7 7 2 ~ 



As our first pass at this, we drop anything that has a I/7, which mean that P^ must 
be annihilated by the left-hand-side of (3.37). and This is a simple differential equation, 
with solution 

P^\v,x,t) = e~ mv2 ' 2kBT ^\x,t) 

for any function (f>^(x,t). Of course, the velocity dependence here is simply the 
Maxwell-Boltzmann distribution. To figure out what restrictions we have on 4>^\ we 
need to go to the next order in perturbation theory. Keeping terms of 0(1/7), the 
differential equation (3.37) becomes 

dv*\m*W + m) F ~ \dt dx 1 k B T dx i ) {6 ^ } 

But this equation cannot be solved for arbitrary (f>(°\ This is simplest to see if we just 
integrate both sides over J d 3 v: the left-hand-side is a total derivative and so vanishes. 
On the right-hand-side, only one term remains standing and this must vanish. It is 

So = With this constraint, the solution to (3.38) is, again, straightforward 

to write down. It is 

/*>(*,„,») = (-™.^ - j^is> -- m " 2,2tBT 
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At this point, it doesn't look like we're making much progress. We still don't know what 
is and we've now had to introduce yet another arbitrary function, <j>^(x, t) which 
carries all the time dependence. Let's plug this once more into (3.37), now working to 
order 0(l/7 2 ). After a little bit of algebra, the equation becomes 



d fk B T d ^ v 



dv l \ m 2 dv l m J 



\ p(2) 



dx i y 13 k B T" " J \dxi ' k B Tdxi 

mv 2 /2k B T 



id ■ d v l dV \ , m 



e 



Once again, there's a consistency condition that must be realised if this equation is 
to have a solution. Integrating over J d 3 v, the left-hand-side is a total derivative and 
therefore vanishes. Any term linear in v on the right-hand-side also vanishes. But so 
too do the terms on the second line: you can check that the Gaussian integral over the 
5ij term exactly cancels the v term. The resulting consistency condition is 

bT » (» 1 (3.39) 



dt dx l \dx % k B T dx l 

where the overall factor of k B T on the right-hand-side comes only arises when you do 
the Gaussian integral over J d 3 v. 

Now we're almost there. (Although it probably doesn't feel like it!). Collecting what 
we've learned, to order 0(1/7), the probability distribution over phase space takes the 
form 

But to make contact with the earlier form of the Fokker-Planck equation (3.28), we 
want a distribution over configuration space. We get this by simply integrating over 
velocities. We'll also denote the resulting probability distribution as P(x,t), with only 
the arguments to tell us that it's a different object: 



P(x, t) = J d 3 v P(x,v,t) = \j 2 -^^- (V 0) (x) + V 1} (x, t) 

But now we can use the consistency condition (3.39) to compute dP/dt. Working only 
to order 0(1/7), this reads 



dP k R T d ( d 1 dV 



dt 7 dx i \dx i k B Tdx\ 
Which is precisely the Fokker-Planck equation (3.28) that we saw previously. 
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3.2.4 Path Integrals: Schrodinger, Feynman, Fokker and Planck 

There is a close similarity between the Fokker-Planck equation and the Schrodinger 
equation in quantum mechanics. To see this, let's return to the first order Langevin 
equation 

'x = - (- W + pj (3.40) 

and the corresponding Fokker-Planck equation (3.28). We can change variables to 

P(x,t) =e- v{x)/2 ~ /D P(x,t) (3.41) 

Substituting into the Fokker-Planck equation, we see that the rescaled probability P 

obeys 

W = + W - ^(W) 2 ) P (3.42) 

There are no first order gradients VP; only V 2 P. This form of the Fokker-Planck 
equation looks very similar to the Schrodinger equation. 

^ = _^ 
at 2m 

All that's missing is a factor of % on the left-hand-side. Otherwise, with a few trivial 
substitutions, the two equations look more or less the same. Note, however, that 
the relationship between the potentials is not obvious: if we want to relate the two 
equations, we should identify 

u = h v2y ~ ^f [vvf {3A3) 

The relationship between the evolution of quantum and classical probabilities is also 
highlighted in the path integral formulation. Recall that the Schrodinger equation 
can be reformulated in terms of function integrals, with the quantum amplitude for a 
particle to travel from x = Xi at time t — ti to Xf at time tf given by 8 . 

(x f ,tf\xi,ti)=N J Vx{t) ex P U / dt ^~ U ^ 



8 A derivation of the path integral from the Schrodinger equation can be found in many places, 
including Section 4 of the notes, http://www.damtp.cam.ac.uk/user/tong/dynamics.html 
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Figure 8: From Chapman-Kolmogorov to Feynman. 

where Af is a normalization factor. Here the integral is over all paths which start at 
(xi,ti) and end at Xf,tf). By analogy, we expect there to be a similar path integral 
formulation of the classical probability for a particle in the Langevin environment (3.40) 
to travel from Xi to Xf. Indeed, the existence of a path integral formulation for this 
problem is very natural. The essence of this can already be seen in the Chapman- 
Kolmogorov equation (3.24) 

/+oo 
d 3 x >' P(x, t; x', t') P(x', t'; x , t ) 
-oo 

This simply says that to get from point A to point B, a particle has to pass through 
some position in between. And we sum up the probabilities for each position. Adding 
many more intervening time steps, as shown in Figure 8, naturally suggests that we 
should be summing over all possible paths. 

Deriving the Path Integral 

Here we will sketch the derivation of the path integral formula for the Fokker-Planck 
equation. We've already met function integrals in Section 3.1.4 where we introduced 
the probability distribution for a given noise function f(t) 

Prob[/(t)] = Afexp ^- J dt (3.44) 

subject to the normalization condition 

J Vf(t) Prob[/(f)] = 1 (3.45) 

But given a fixed noise profile f(t) and an initial condition, the path of the particle is 
fully determined by the Langevin equation (3.40). Let's call this solution Xf. Then the 



- 73 - 



probability that the particle takes the path Xf is the same as the probability that the 
force is /, 



Prob [£/(£)] = Prob [/(£)] = M exp f - J dt 



f(t) ■ f(t ) 
4D-f 2 



where, in the last line, we've used the Langevin equation (3.40) to relate the force to 
the path taken. But since this equation holds for any path Xf, we can simply drop the 
/ label. We have the probability that the particle takes a specific path x(t) given by 

Prob[f(t)] =Wexp (-4^2 J dt (7^+ VV) 2 ^j 

The total probability to go from Xi to Xf should therefore just be the sum over all these 
paths. With one, slightly fiddly, subtlety: the probability is normalized in (3.45) with 
respect to the integration measure over noise variable /. And we want to integrate over 
paths. This means that we have to change integration variables and pick up a Jacobian 
factor for our troubles. We have 

Piob[x f ,t f ;a! i ,t i ]=tf J Vf(t) exp(-^ J dt (7^ + W(£,) 2 ) 

= M J Vx(t) detM exp (-773^ J dt ( 7 f +VV) 2 ^j (3.46) 

Here the operator Ai(t,f) that appears in the Jacobian be thought of as 5f(t)/5x(t'). 
It can be written down by returning to the Langevin equation (3.40) which relates /and 

x, 

M(t, = 7 J^(t " + W<f(t - f) 

If we want to think in a simple minded way, we can consider this as a (very large) 
matrix Mtt>, with columns labelled by the index t and rows labelled by t'. We'll write 
the two terms in this matrix as M. = A + B so the determinant becomes 

det(A + B) = det A det(l + A^B) (3.47) 

The first operator A = / ~fd/dt5(t — t') doesn't depend on the path and its determinant 
just gives a constant factor which can be absorbed into the normalization M . The 
operator A~ l in the second term is defined in the usual way as 



/ 



dt'A(t,t')A-\t:t") = 5(t-t") 
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where the integral over J dt' is simply summing over the rows of A and the columns of 
A^ 1 as in usual matrix multiplication. It is simple to check that the inverse is simply 
the step function 

A-\t',t") = -6(t'-t") (3.48) 

7 

Now we write the second factor in (3.47) and expand, 

det(l + A^B) = exp (jr log(l + A^B^j = exp ( ^ Tr^ 1 B) n /nj (3.49) 

n 

Here we should look in more detail at what this compact notation means. The term 
TiA^B is really short-hand for 

TtA~ 1 B = J dtdt' A-*(t,lf)B(lf,t) 

where the integral over J dt' is multiplying the matrices together while the integral over 
f dt comes from taking the trace. Using (3.48) we have 

TrA^B = i J dtdt 1 6{t - t')V 2 V5(t -t) = 9 M- j dt VV 

The appearance of #(0) may look a little odd. This function is defined to be 9(x) = +1 
for x > and 6(x) = for x < 0. The only really sensible value at the origin is 
0(0) = 1/2. Indeed, this follows from the standard regularizations of the step function, 
for example 

What happens to the higher powers of (A^B)' 11 ? Writing them out, we have 
Tr^- 1 ^)" = Jdtjdh... dt 2n _ x 8(t - - t 2 )9(t 2 - t 3 )5(t 3 -U)... 

(V 2 V) n 



■ ■ ■ d{t 2n -2 — t 2n -l)5(t 2n -i — t) 



where we have been a little sloppy in writing (V 2 V) n because each of these is actually 
computed at a different time. We can use the delta-functions to do half of the integrals, 
say all the t n for n odd. We get 



Tr^- 1 ^)™ = J dt J dt 2 dU ■■■ 0(t- t 2 )0{t 2 - t 4 )9(U -t 6 )... 0(t 2n _ 2 - t) 



(V 2 V) n 

7 
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But this integral is only non-vanishing only if t > ti > £4 > t§ > . . . > t n > t. In 
other words, the integral vanishes. (Note that you might think we could again get 
contributions from 6(0) = 1/2, but the integrals now mean that the integrand has 
support on a set of zero measure. And with no more delta-functions to rescue us, gives 
zero. The upshot of this is that the determinant (3.49) can be expressed as a single 
exponential 

det(l + A- l B) = exp Q- j dt VV 

We now have an expression for the measure factor in (3.46). Using this, the path 
integral for the probability becomes, 

Probff/, = M' j Vx (t) ex p(~7F5^ / dt (7^+V\/) 2 + ^- J dtV 2 V^j 

= M > e [V {xf )-VW t D J Vx{t) exp (-Jdt ^ 

where U is given in (3.43). Notice that the prefactor e^ v ^ x ^~ v ^ Xi ^^ 2lD takes the same 
form as the map from probabilities P to the rescaled P in (3.41). This completes our 
derivation of the path integral formulation of probabilities. 

3.2.5 Stochastic Calculus 

There is one final generalization of the Langevin equation that we will mention but 
won't pursue in detail. Let's return to the case m — 0, but generalise the noise term 
in the Langevin equation so that it is now spatially dependent. We write 

7 f = -VV + b(x) fit) (3.50) 

This is usually called the non-linear Langevin equation. The addition of the b(x) multi- 
plying the noise looks like a fairly innocuous change. But it's not. In fact, annoyingly, 
this equation is not even well defined! 

The problem is that the system gets a random kick at time t, the strength of which 
depends on its position at time t. But if the system is getting a delta-function impulse 
at time t then its position is not well defined. Mathematically, this problem arises when 
we look at the position after some small time St. Our equation (3.20) now becomes 

1 1 rt+St 

S£ = £S t = vV5t + - dt' b(x(t')) f(t') 

7 7 Jt 

and our trouble is in making sense of the last term. There are a couple of obvious ways 
we could move forward: 
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• Ito: We could insist that the strength of the kick is related to the position of the 
particle immediately before the kick took place. Mathematically, we replace the 
integral with 

pt+8t pt+8t 
/ dt' b(x(t')) f(t') — > b(x(t)) / dt'f(t') 
Jt Jt 

This choice is known as Ito stochastic calculus. 

• Stratonovich: Alternatively, we might argue that the kick isn't really a delta 
function. It is really a process that takes place over a small, but finite, time. To 
model this, the strength of the kick should be determined by the average position 
over which this process takes place. Mathematically, we replace the integral with, 

/t+St I pt+St 

dt' b(x(t')) f(t') — > -[b(x(t + 5t))+b(x(t))] J dt'f(t') 

This choice is known as Stratonovich stochastic calculus. 

Usually in physics, issues of this kind don't matter too much. Typically, any way of 
regulating microscopic infinitesimals leads to the same macroscopic answers. However, 
this is not the case here and the Ito and Stratonovich methods give different answers 
in the continuum. In most applications of physics, including Brownian motion, the 
Stratonovich calculus is the right way to proceed because, as we argued when we first 
introduced noise, the delta-function arising in the correlation function {f{t)f(t')) is just 
a convenient approximation to something more smooth. However, in other applications 
such as financial modelling, Ito calculus is correct. 

The subject of stochastic calculus is a long one and won't be described in this course 9 . 
For the Stratonovich choice, the Fokker-Planck equation turns out to be 

BP 1 

-t£- = -V • LP(W - D^bVb)] + DV 2 (b 2 P) 
at 7 

This is also the form of the Fokker-Planck equation that you get by naively dividing 
(3.50) by b(x) and the defining a new variable y = x/b which reduces the problem to 
our previous Langevin equation (3.19). In contrast, if we use Ito stochastic calculus, 
the 6V6 term is absent in the resulting Fokker-Planck equation. 



9 If you'd like to look at this topic further, there are a number of links to good lecture notes on the 
course webpage. 
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